Parsing

Core Concept

Parsing is the task of predicting a syntactic structure over a sentence—typically a tree that represents constituency (phrase structure) or dependency (head–dependent relations). The output is a rooted tree whose nodes or arcs are labeled (e.g. NP, VP, or subject, object). Valid outputs must satisfy grammatical constraints (e.g. well-formed brackets, a single head per word), so parsing is both a structured prediction problem and a constrained decoding problem. Models are trained on treebanks (sentences paired with gold trees) and evaluated by how closely predicted trees match the gold trees (attachment scores for dependency parsing, bracket F1 for constituency parsing). Parsing supports downstream tasks such as information extraction, question answering, and machine translation.
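Evaluation itself is simple once trees are represented concretely. As a minimal sketch, assuming dependency trees are stored as head arrays (the representation and names here are illustrative, not from any particular toolkit), unlabeled attachment score (UAS) is just per-word head agreement:

```python
def uas(gold_heads, pred_heads):
    """Unlabeled attachment score: the fraction of words whose
    predicted head matches the gold head."""
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# "the dog barks": words are 1-indexed, 0 denotes ROOT.
gold = [2, 3, 0]  # the -> dog, dog -> barks, barks -> ROOT
pred = [3, 3, 0]  # the parser attached "the" to "barks" by mistake
print(uas(gold, pred))  # 0.666...
```

Labeled attachment score (LAS) additionally requires the relation label to match; bracket F1 for constituency compares predicted and gold labeled spans in the same spirit.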

Key Characteristics

  • Tree-shaped output – Output is a tree (constituency: nested phrases; dependency: directed arcs from heads to dependents). The space of valid trees is defined by a grammar or dependency rules, so decoding must produce only valid structures.
  • Global structure – Decisions are globally constrained (e.g. every word has exactly one head in dependency parsing; brackets must nest in constituency). Local classifiers or scorers are combined with dynamic programming or transition systems to enforce validity; see the sketch after this list.
  • Two main formalisms – Constituency parsing produces phrase-structure trees (e.g. NP, VP, S); dependency parsing produces trees of head–dependent arcs, one head per word (except the root). Conversion between the two is possible but not trivial.
  • Decoding – Constituency: CKY or chart parsing with a context-free grammar. Dependency: dynamic programming (e.g. the Eisner algorithm) for projective trees, or transition-based systems (shift-reduce, arc-standard), which handle non-projective trees via extensions such as a swap action. Neural parsers often use transition-based decoding with learned policies.
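To make the global constraints concrete: a dependency tree over n words can be encoded as a head array, and validity (exactly one root, one head per word, no cycles) can be checked directly. A minimal sketch under that encoding (illustrative only):

```python
def is_valid_tree(heads):
    """heads[i] is the head of word i+1 (words are 1-indexed, 0 = ROOT).
    The array encoding already enforces 'one head per word'; we check
    for exactly one root and that every head chain reaches ROOT
    without a cycle."""
    n = len(heads)
    if heads.count(0) != 1:
        return False
    for start in range(1, n + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen or not 0 <= heads[node - 1] <= n:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

print(is_valid_tree([2, 0, 2]))  # True: word 2 heads words 1 and 3
print(is_valid_tree([2, 1, 0]))  # False: words 1 and 2 form a cycle
```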

Common Applications

  • Constituency (phrase-structure) parsing – Producing nested phrase brackets and nonterminal labels (e.g. Penn Treebank style); used in grammar checking and extraction of phrase-level semantics.
  • Dependency parsing – Producing head–dependent arcs and relation types; used in many NLP pipelines (relation extraction, semantic role labeling, MT).
  • Semantic parsing – Mapping sentences to meaning representations (e.g. logical form, AMR); the output may be a tree or a graph.
  • Code parsing – Producing ASTs (abstract syntax trees) for source code; supports refactoring, analysis, and code generation (see the sketch below).
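Python's built-in ast module shows code parsing in miniature; the tree's nesting encodes operator precedence the way phrase nesting encodes syntax. (ast.dump's indent argument requires Python 3.9+.)

```python
import ast

# Parse a statement into an abstract syntax tree.
tree = ast.parse("total = price * qty + tax")
print(ast.dump(tree, indent=2))
# The BinOp nesting (Add at the top, Mult inside) reflects operator
# precedence, much as phrase nesting reflects syntax in a treebank.
```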

Parsing Algorithms

Parsing algorithms differ in the formalism (constituency vs. dependency), the decoding strategy (dynamic programming vs. transition-based vs. chart parsing), and whether they use handcrafted grammars or learned scoring (including neural models). The choice depends on the language, the available treebank, and whether interpretability (grammar-based) or accuracy (data-driven) is the priority.

  • CKY (Cocke–Kasami–Younger) – Dynamic programming for context-free constituency parsing; fills a chart with nonterminals over spans; runs in O(n^3 · |G|) time for sentence length n and grammar size |G|. Requires a grammar in Chomsky normal form; exact decoding for the given grammar.
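A minimal CKY sketch over a toy probabilistic CNF grammar; the grammar, probabilities, and sentence are invented for illustration:

```python
from collections import defaultdict

binary = {            # (B, C) -> list of (A, prob) for rules A -> B C
    ("NP", "VP"): [("S", 1.0)],
    ("Det", "N"): [("NP", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
}
lexical = {           # word -> list of (A, prob) for rules A -> word
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5)],
    "cat": [("N", 0.5)],
    "chased": [("V", 1.0)],
}

def cky(words):
    n = len(words)
    # chart[i][j][A]: best probability of A spanning words[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    back = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                 # width-1 spans
        for A, p in lexical.get(w, []):
            chart[i][i + 1][A] = p
            back[i][i + 1][A] = w
    for width in range(2, n + 1):                 # wider spans
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):             # split point
                for B, pb in chart[i][k].items():
                    for C, pc in chart[k][j].items():
                        for A, pr in binary.get((B, C), []):
                            p = pr * pb * pc
                            if p > chart[i][j][A]:
                                chart[i][j][A] = p
                                back[i][j][A] = (k, B, C)
    return chart, back

chart, back = cky("the dog chased the cat".split())
print(chart[0][5]["S"])  # probability of the best S parse: 0.25
```

The triple loop over spans and split points is where the O(n^3) factor comes from; following the back entries recovers the best tree.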

  • Chart parsing (Earley, etc.) – General chart-based algorithms for constituency parsing with arbitrary CFGs; maintain partial parses and extend incrementally. Flexible with respect to grammar form; used in classic NLP and grammar development.
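For contrast with CKY, a compact Earley recognizer: it handles an arbitrary CFG (no CNF conversion) by interleaving predict, scan, and complete steps over a chart. A sketch with a toy grammar (all rules invented):

```python
def earley_recognize(grammar, start, words):
    """Earley recognizer. grammar: nonterminal -> list of right-hand
    sides (tuples); anything not a grammar key is a terminal.
    A state is (lhs, rhs, dot, origin)."""
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(n + 1):
        changed = True
        while changed:                             # predict/complete fixpoint
            changed = False
            for (lhs, rhs, dot, orig) in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in grammar:   # predict
                    for r in grammar[rhs[dot]]:
                        st = (rhs[dot], r, 0, i)
                        if st not in chart[i]:
                            chart[i].add(st); changed = True
                elif dot == len(rhs):                        # complete
                    for (l2, r2, d2, o2) in list(chart[orig]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            st = (l2, r2, d2 + 1, o2)
                            if st not in chart[i]:
                                chart[i].add(st); changed = True
        if i < n:                                            # scan
            for (lhs, rhs, dot, orig) in chart[i]:
                if dot < len(rhs) and rhs[dot] == words[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, orig))
    return any(lhs == start and dot == len(rhs) and orig == 0
               for (lhs, rhs, dot, orig) in chart[n])

g = {"S": [("NP", "VP")], "NP": [("the", "dog"), ("the", "cat")],
     "VP": [("chased", "NP"), ("barked",)]}
print(earley_recognize(g, "S", "the dog chased the cat".split()))  # True
```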

  • Eisner algorithm – Dynamic programming for projective dependency parsing; finds the highest-scoring projective dependency tree in O(n^3). Used with arc-factored (or higher-order) scorers; exact for projective trees. (Non-projective parsing instead uses maximum spanning tree algorithms such as Chu–Liu/Edmonds.)
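A sketch of the Eisner DP over arc-factored scores. For brevity it returns only the optimal score; backpointers over the same tables recover the tree itself. Scores and dimensions are invented for illustration:

```python
import numpy as np

NEG = float("-inf")

def eisner_best_score(score):
    """Best projective tree score via Eisner's O(n^3) DP.
    score[h, m] = score of the arc h -> m; token 0 is ROOT."""
    n = score.shape[0] - 1                   # number of real words
    # direction index: 0 = head on the right, 1 = head on the left
    C = np.full((n + 1, n + 1, 2), NEG)      # complete spans
    I = np.full((n + 1, n + 1, 2), NEG)      # incomplete spans
    for i in range(n + 1):
        C[i, i, 0] = C[i, i, 1] = 0.0
    for width in range(1, n + 1):
        for i in range(n + 1 - width):
            j = i + width
            # add an arc across the best split of two complete halves
            best = max(C[i, k, 1] + C[k + 1, j, 0] for k in range(i, j))
            I[i, j, 0] = best + score[j, i]  # arc j -> i
            I[i, j, 1] = best + score[i, j]  # arc i -> j
            # extend an incomplete span into a complete one
            C[i, j, 0] = max(C[i, k, 0] + I[k, j, 0] for k in range(i, j))
            C[i, j, 1] = max(I[i, k, 1] + C[k, j, 1]
                             for k in range(i + 1, j + 1))
    return C[0, n, 1]

# Two words; impossible arcs get -inf. Best tree: ROOT -> 2, 2 -> 1.
s = np.array([[NEG, 1.0, 0.5],
              [NEG, NEG, 2.0],
              [NEG, 3.0, NEG]])
print(eisner_best_score(s))                  # 3.5
```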

  • Transition-based dependency parsing – Builds the dependency tree with a sequence of actions (e.g. shift, left-arc, right-arc) over a stack and buffer; a classifier or scorer predicts the next action. No explicit grammar; basic systems (arc-standard, arc-eager) produce only projective trees, with extensions such as a swap action handling non-projectivity. Fast and widely used (e.g. MaltParser, the Stanford neural dependency parser).
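The mechanics of the arc-standard system in miniature: in a real parser a trained classifier chooses each action, but here a hand-written action sequence makes the stack/buffer dynamics visible (illustrative only):

```python
def arc_standard(words, actions):
    """Arc-standard transitions: SHIFT moves the next buffer word onto
    the stack; LEFT-ARC / RIGHT-ARC attach the top two stack items and
    pop the dependent. Returns (head, dependent) arcs."""
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), []
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":            # second-from-top <- top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "RIGHT-ARC":           # second-from-top -> top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["the", "dog", "barks"]            # words 1, 2, 3; 0 = ROOT
acts = ["SHIFT", "SHIFT", "LEFT-ARC",      # dog <- the
        "SHIFT", "LEFT-ARC",               # barks <- dog
        "RIGHT-ARC"]                       # ROOT -> barks
print(arc_standard(words, acts))           # [(2, 1), (3, 2), (0, 3)]
```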

  • Neural transition-based parser – The same transition system as above, but with a neural network (BiLSTM or transformer) providing the state representation and action scores. Trained on a treebank; achieves state-of-the-art accuracy with far less feature engineering.
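A radically simplified sketch of neural action scoring in the spirit of Chen & Manning (2014): embed a couple of parser-state tokens, run an MLP, score the transitions. The weights are random and untrained, and the dimensions, feature choice, and ReLU (their model used a cube activation) are all assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, ACTIONS = 100, 16, 3           # SHIFT, LEFT-ARC, RIGHT-ARC
E = rng.normal(size=(VOCAB, DIM))          # word embeddings (untrained)
W1 = rng.normal(size=(2 * DIM, 32))        # hidden layer
W2 = rng.normal(size=(32, ACTIONS))        # action-score layer

def score_actions(stack_top_id, buffer_front_id):
    """Concatenate embeddings of parser-state tokens, apply an MLP,
    return one score per transition (sketch only)."""
    x = np.concatenate([E[stack_top_id], E[buffer_front_id]])
    h = np.maximum(0.0, x @ W1)            # ReLU stands in for cube
    return h @ W2

print(score_actions(5, 7))                 # argmax would pick the action
```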

  • Neural chart / constituency parser – Chart-based constituency parsing with neural scoring of spans or rules (e.g. span labeling, or rule scores in a PCFG). Decoding uses CKY or chart search; parameters are learned from a treebank. Can be combined with pre-trained encoders (BERT, etc.).
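A sketch of chart decoding over span scores, loosely in the spirit of span-based parsers such as Stern et al. (2017): the best tree over a span is its own span score plus the best split into two sub-spans. The scoring function would come from a neural encoder; here a toy dict stands in (all values invented):

```python
from functools import lru_cache

def best_tree(span_score, n):
    """Best binary bracketing over positions 0..n, maximizing the sum
    of span scores. Returns (total score, list of chosen spans)."""
    @lru_cache(maxsize=None)
    def best(i, j):
        if j - i == 1:                     # single word: no split
            return span_score(i, j), [(i, j)]
        cands = []
        for k in range(i + 1, j):          # try every split point
            ls, lt = best(i, k)
            rs, rt = best(k, j)
            cands.append((ls + rs, lt + rt))
        s, t = max(cands, key=lambda c: c[0])
        return s + span_score(i, j), [(i, j)] + t
    return best(0, n)

# Toy scores that favor the bracketing ((0,2),(2,3)).
toy = {(0, 3): 1.0, (0, 2): 2.0, (1, 3): -1.0}
score, spans = best_tree(lambda i, j: toy.get((i, j), 0.0), 3)
print(score, sorted(spans))  # 3.0 [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
```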

  • Head-driven phrase structure grammar (HPSG) / LFG parsers – Parsers for rich constraint-based grammars (HPSG, Lexical-Functional Grammar); the output is a detailed syntactic–semantic structure. Used in grammar engineering and deep linguistic analysis; typically chart-based with handcrafted grammars.
